ABSTRACT

To test the applicability of multidimensional ratings of writing
effectiveness that are amenable to normal classroom usage, all grade 7
students (N=139) from one high school (Sydney, Australia) wrote a brief
essay. Master and student teachers evaluated all the essays according
to overall effectiveness of written expression and according to
holistic ratings of specific components (mechanics, sentence structure,
organization, word usage, content/ideas, and style). Ratings of writing
effectiveness by master teachers and by student teachers were
substantially correlated with each other and with an external validity
criterion. Correlations were particularly high for the sum of ratings
of specific components, but were nearly as high for overall, global
ratings. The single-rater reliability (r=0.7), the average of
correlations between each pair of raters, was higher than expected from
previous research. The average of single-rater reliabilities for
specific components (r=0.6) was also high. However, the predicted
ability of teachers to discriminate among the multiple components,
except perhaps the mechanics facet, was not supported in a variety of
multitrait-multimethod analyses. The student teacher ratings were
nearly as reliable and as valid as master teacher ratings, and student
teachers were perhaps better able to differentiate among different
components of writing effectiveness. (PN)
Multidimensional Evaluations of Writing Effectiveness

The objective of an essay test is to determine whether a student is
able to write a clear and effective essay on a given topic. The
assessment of an essay may focus on writing effectiveness or on the
level of achievement in a content area. While these two uses of essay
testing have some common features, it is important to distinguish
between them. The focus of the present study is on the evaluation of
effective writing, but as Foley (1971) suggests, the lack of a
definition of "good writing" makes the task difficult. Most research
in this area employs holistic/impressionistic ratings where raters form
a single, overall impression, but there is research which uses
analytic procedures or a technique which combines aspects of both
approaches. In a purely analytic approach objective measures of
language production (e.g., number of words, words per clause, ratio of
subordinate clauses to total clauses, spelling errors, etc.) are
measured or counted. A hybrid technique, representing a compromise
between the two approaches, is to obtain global ratings on specific
components hypothesized to underlie overall writing effectiveness
(e.g., mechanics, organization, style).

Traditionally, high school English teachers evaluate writing in
two ways. First, they "correct" an essay and provide varying amounts
of formative feedback to the student writer. This task typically
involves some form of the analytic or hybrid approach. Second, they
assign an overall, summative evaluation (a mark or a grade) to the
essay, which is generally a holistic rating. Harris (1977) found that
content and organization, as opposed to mechanics, were more important
in determining overall evaluations of writing samples, but that
written, formative feedback to students emphasized mechanics. Freedman
(1979) experimentally manipulated essays and found that while content
and organization were most important to the determination of overall
evaluations, mechanics and sentence structure also had some influence.
She suggested that the relative influence of different components might
vary depending upon the rater, or the type or purpose of essay, and
this might contribute to the unreliability of overall impressions.
Chase (1968, 1983) argued that even when raters are specifically
instructed to ignore factors such as quality of handwriting and
mechanical errors, they are apparently unable to do so and that their
overall ratings reflect experimentally manipulated effects due to such
influences.
Quellmalz, Capell and Chou (1982) argue that there are generically
distinct methods of writing for particular purposes such as narrative
and expository essays. They found that overall ratings were
consistently lower for narrative essays than expository essays and that
the rater agreement on two different essays written by the same student
was higher when both essays were expository or both were narrative than
when one was expository and one was narrative. However, their results
also depended on the particular component of writing effectiveness
that was being assessed. For example, mechanics was the component of
writing effectiveness that was most clearly distinguished, and rater
agreement on two different essays for Mechanics did not depend on
whether both essays were written for the same purpose or for different
purposes. The authors concluded that their findings challenged the
assumption that writing effectiveness is a unidimensional construct,
and argued for the development of specific components of writing
effectiveness.

Single Rater Reliabilities With Holistic/Impressionistic Marking
Sources of error in evaluating essays include:

1) Student error -- chance fluctuations in the performance of the
student which are not stable over time;

2) Test error -- since a writing sample can be considered a one-item
test based on only a limited sample of relevant behavior, individual
students may perform better or worse on different, equally appropriate
essay topics;

3) Scale error -- idiosyncratic ways in which a particular rater uses
the given response scale when evaluating an essay;

4) Rater error -- error due to disagreement in ratings of the relative
quality of the same essay by different raters;

5) Writing purpose error -- alternative writing tasks (e.g., narrative
and expository) may tap different components of writing effectiveness
and performance on one task may not generalize to the other.

The focus of this study will be on the rater error, though it is clear
that each source of error and interactions among the different sources
can be substantial (Breland & Gaynor, 1979; Coffman, 1966; 1971;
French, 1966; Moss, Cole & Khampalkit, 1982; Quellmalz, Capell & Chou,
1982). The reliability of essay evaluations, even when consideration
is limited to rater error, varies systematically and predictably with
the number of raters (e.g., Coffman, 1966; 1971). Consequently, for
purposes of this study, the single-rater reliability will be defined as
the correlation between two ratings of the same essay each performed
independently by two separate individuals, or the average correlation
between all pairs of raters when there are more than two raters.
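
The definition above can be sketched in a few lines of Python. This is
only an illustration, not the authors' procedure: the function name
`single_rater_reliability` and the small table of scores are
hypothetical, and Pearson correlations are assumed.

```python
import numpy as np
from itertools import combinations


def single_rater_reliability(ratings):
    """Average Pearson correlation over all pairs of raters.

    ratings: array of shape (n_essays, n_raters); each column holds one
    rater's independent scores for the same set of essays.
    """
    ratings = np.asarray(ratings, dtype=float)
    corr = np.corrcoef(ratings, rowvar=False)  # rater-by-rater matrix
    n_raters = corr.shape[0]
    pairs = list(combinations(range(n_raters), 2))
    return float(np.mean([corr[i, j] for i, j in pairs]))


# Hypothetical scores: 6 essays rated by 3 raters on a 9-point scale.
example = np.array([
    [2, 3, 2],
    [4, 4, 5],
    [5, 6, 5],
    [6, 5, 7],
    [8, 7, 8],
    [9, 8, 8],
])
reliability = single_rater_reliability(example)
```

With only two raters the function reduces to the ordinary correlation
between their two sets of ratings, which matches the first half of the
definition.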

Hall (1972), in a review of research conducted in the US, England
and Australia prior to 1972, concluded that single-rater reliabilities
of about 0.60 appear to represent "the limit of the extent of
agreement one can generally expect between single judges marking an
essay" (p. 32). Coffman (1966) reported that the correlation between
responses by two raters (i.e., the single rater reliability) to the
same short essay was about 0.38, though the reliability of the sum of
responses by five raters was 0.76. Huddleston (1954) reported that
highly trained examiners for English compositions on the College Board
Examination were able to achieve a single-rater reliability of about
0.55 for a long paper on a single topic. Diederich (1974) reported that
"even after working with an English staff for some time, I have rarely
been able to boost the average correlation between pairs of readers
above 0.50, and other examiners tell me that this is about what they
get" (p. 33). French (1966) suggested that with extensive training and
monitoring, the single-rater reliabilities could be as high as 0.70,
but that when untrained raters from various academic disciplines were
asked to evaluate essays according to their own judgments of what
constitutes writing ability, the single-rater reliability was only
0.31.
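
The dependence of reliability on the number of raters can be
illustrated with the Spearman-Brown formula. The text does not name
this formula, so its use here is an assumption; the sketch only checks
that Coffman's two figures (0.38 for one rater, 0.76 for the sum of
five) are roughly consistent with it.

```python
def spearman_brown(single_rater_r, n_raters):
    """Predicted reliability of the sum (or mean) of n_raters ratings,
    given the reliability of a single rater."""
    return n_raters * single_rater_r / (1 + (n_raters - 1) * single_rater_r)


# Coffman (1966): single-rater reliability of about 0.38.
predicted_five_raters = spearman_brown(0.38, 5)  # about 0.75
```

The predicted value of about 0.75 is close to the 0.76 reported for
five raters.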

In the same study French (1966) reported that the single-rater
reliability for English teachers was 0.41, and was appreciably higher
than for the group as a whole. Thompson & Bailes (1926) reported
single-rater reliabilities of 0.65 for experienced teachers and 0.50 for
untrained students. Michael, Cooper, Shaffer and Wallis (1980) also
found that the single-rater reliability for English professors was 0.64
and 0.85 on two essay topics, while corresponding values based upon
ratings by faculty from other disciplines were 0.56 and 0.64. However,
Phillips (1948) had essays graded by 77 practicing teachers and 373
education students who were not teachers and found that the
single-rater reliability was 0.43 for teachers and 0.41 for
non-teachers.

Harkin (1983) described procedures used in the corporate
holistic marking of the New South Wales (Australia) English reference
test which is completed by 75,000 year 10 students each year. Prior to
marking, senior examiners select "range-finder" essays which are used
to define each of the categories on the 15-point response scale used to
evaluate essays. Examiners are brought to a single location, briefed
on the use of the range-finder essays, and given considerable training
and practice before the actual marking exercise is begun. During the
marking, examiners each work within a group of three, and are
encouraged to converse with other members of their team when
questions arise, though the actual ratings are made by a single
examiner. During the marking operation the mean, standard deviation,
and reliability estimates of responses by each marker are tabulated on
a daily basis, and additional consultation with senior examiners occurs
when necessary. Using this system random samples of essays were each
graded by multiple markers and Harkin reported a mean single-rater
reliability of 0.80. However, this value is probably a somewhat
inflated estimate of the correlation between two raters grading the
same essay when each is working strictly independently. Even higher
estimates of reliability were obtained with samples of the essays
specifically selected to unambiguously represent each of the 15 scale
points, although these essays are selected to monitor the marking
operation and not to provide an unbiased estimate of single rater
reliability.

In summary, single rater reliabilities generally vary between 0.3
and 0.8 depending upon the length and the topic of the essay, the
amount of freedom students have in selecting and responding to the
essay, the experience of the raters, the extent of training given the
raters, and the control exercised in monitoring and standardizing the
rating environment. The single rater reliability for short, in-class
essays marked by classroom teachers tends to be substantially lower than
estimates obtained in large, corporate marking studies.
Components of Writing Effectiveness

French (1966) summarized an attempt to derive components of
writing effectiveness from the comments from readers of essays. The
comments were classified into 55 categories and submitted to factor
analytic techniques. French identified five factors representing
Ideas, Form, Flavor, Mechanics, and Wording. Foley (1971), using this
and other research, argues that writing effectiveness can be
categorized into five major components: Ideas, Organization, Style,
Mechanics, and Choice of Words. However, much of this research rests
on inferences from written comments or a logical analysis of
the writing process, rather than upon the determination of whether
raters are actually able to distinguish between these components.
Studies by Cast (1939), Hartog (1941), Moss et al. (1982), and Smith
(1979) each suggest that a general factor of writing effectiveness
underlies ratings of specific components.

Smith (1979) compared impressionistic/holistic ratings, ratings on
six specific components (focus, development, organization, support,
paragraphing and mechanics), and objective test scores designed to
measure the same six components. Total scores for the holistic and
rating scales were substantially correlated (0.[illegible] and 0.6,
respectively) with the objective total score. However, ratings among
the six specific scales were highly correlated, ranging from
0.6[illegible] to 0.90, and were highly correlated with the total of
the specific ratings, correlations ranging from 0.82 to 0.96. While
Smith concluded that it was tempting to infer that the specific rating
scales actually tapped a single unitary dimension of writing
effectiveness, she suggested that distinguishable subscales may emerge
when the writing task is less structured. She also found some support
for raters' ability to distinguish Mechanics from other components of
writing effectiveness.

In a technically sophisticated study, Quellmalz, Capell and Chou
(1982) compared ratings of general impression, ratings on four specific
components of writing effectiveness, and objective test scores designed
to measure three of the four specific components. The specific
components were four of the six employed in the Smith (1979) study, and
the scoring systems used in the two studies were similar. Quellmalz et
al., however, examined writing effectiveness for expository essays,
narrative essays, and for a paragraph writing task. Although a wide
variety of analyses are reported, the most relevant to the present
investigation was the multitrait-multimethod analysis of specific
ratings of the essays (their step 1 in table 4, p. 253). While their
analysis argued for the existence of three distinguishable facets,
Coherence, Support and Mechanics, correlations among these
trait-factors varied from 0.63 to 0.80. Although not reported, the
authors indicated that the correlations among the components were even
larger when results from the paragraph writing task and/or the
objective test scores were included in the analysis. As with the Smith
study, the Mechanics component was most distinct. The authors argued
that "further examination of the value of rating writing according to
separate component features should consider both their diagnostic
utility and component distinctiveness" (p. 25[illegible]), a
recommendation that is consistent with the aim of the present
investigation. The study also demonstrated the importance of
confirmatory factor analysis of MTMM data in the study of multiple
dimensions of writing effectiveness.

Intuitively, evaluations of effective writing seem to be
multifaceted, and the structure outlined by Foley (1971) provides a
well-conceived, theoretical basis for what the different facets might
be. If these components can be reliably differentiated, then the
evaluation of each component separately has several possible
advantages, particularly in the typical classroom setting:

1) Feedback to Students. Scores on the separate components, in addition
to written comments, and perhaps an overall mark, will provide
students with more detailed feedback which is formative in nature.
This is particularly important if, as Harris suggests, formative
feedback traditionally emphasizes different components than do
overall, summative assessments.

2) Definition of Effective Writing. Effective writing is difficult to
evaluate, partially because there is no operational definition of what
constitutes effective writing. The successful application of these
categories would provide a better definition of what is meant by
effective writing, and how this differs for different kinds of writing.

3) Reliability. An average rating across several components may be
more reliable than is an overall global assessment, particularly if
part of the disagreement among raters is due to the relative emphasis
placed on the different components.

4) Validity. Improving reliability may improve validity as well.
Furthermore, the optimal weighting of the different components may
vary, depending upon the criterion of effectiveness, but this
information is lost if only an overall assessment is used.

5) Bias. Variables which may bias ratings of writing effectiveness are
likely to have a larger impact on a single, ill-defined, overall
assessment of writing effectiveness than on separate, more narrowly
defined components.
The Present Study

The present study is designed to test the applicability of
multidimensional ratings of writing effectiveness which are amenable to
normal classroom usage, rather than to determine what might be possible
in an ideal setting. It is important to note that raters were
specifically not given extensive training in the rating task, that the
ratings were not made in a highly controlled setting, that the raters
had no chance to discuss the task with each other or the researchers,
and that the constraints on the task for student writers and for raters
were not specifically designed for purposes of this study. The rating
tasks were relatively unstructured and teachers were encouraged to use
perspectives they typically employ in their own practice.

Ratings of multiple components and an overall evaluation were made
by both master and student teachers. Two procedures were used in the
analysis. First, overall ratings and total scores derived from the
component ratings were obtained. Single rater reliabilities were
determined, and the ratings were correlated with an external validity
criterion. Second, multitrait-multimethod analyses were employed to
determine if the teachers were able to differentiate among the
hypothesized components of writing effectiveness. It was predicted
that:

1) the single rater reliability of responses by master teachers would
be about 0.5 for overall impressions, and somewhat higher for ratings
based upon the sum of ratings of specific components;

2) validity estimates would also be somewhat higher for total scores
than for overall impressions;

3) both reliability and validity estimates would be somewhat lower for
student teachers;

4) master and student teachers would be able to differentiate among the
different rating components, that the differentiation would be better
for more objective components like mechanics, and master teachers would
be better able to differentiate among the components.

Method 

Sample and Procedures

Students consisted of all 139 seventh grade students attending one
public, coeducational high school in suburban Sydney, Australia.
Virtually all students were native English speakers and were born in
Australia. The students were somewhat brighter than average, as
indicated by the mean IQ of 106 obtained from their school records.
The socioeconomic status of the geographic areas serviced by the
school varied from working class to upper class, though the majority of
the students came from middle class backgrounds.

Early in the academic school year, all seventh grade students were
asked to write a brief story of one or two pages about one of three
possible subjects: a chase, an animal, or a game. The choice of the
subject was up to the student. (Wiseman & Wrigley, 1958, demonstrated
that allowing children to select a topic had little impact on errors in
marking.) Instructions were read aloud to all students, but once they
actually began writing, they were given no help or assistance. Hence,
the task which is the focus of this study is similar to the school
performance test described below. The completed essays in this study
varied in length from about 100 words to about 500 words.

specific components, and also gave an overall evaluation. They were
given the following descriptions of the components:

1) MECHANICS (e.g., spelling, capitalization, punctuation, grammar,
tense, subject-verb agreement, etc.)

2) SENTENCE STRUCTURE (e.g., use of complete sentences, appropriate use
of phrases/clauses, word order, variations in structure, etc.)

3) WORD USAGE (e.g., fluency, appropriateness, selection, range of
usage, level of usage, etc.)

4) ORGANIZATION (e.g., adequate introduction & ending, logical order,
paragraph/theme structure, coherence, emphasis, transition, etc.)

5) CONTENT/IDEAS (e.g., relevance to topic, comprehensibility, logic,
clarity, appropriate explanation and summarization, relevant
arguments/examples, etc.)

6) QUALITY OF STYLE (e.g., originality, creativity, flavor, interest
value, freshness, individuality, etc.)

7) OVERALL EVALUATION (This judgment should be made according to your
own criteria and should represent your own subjective impression. It
may or may not reflect the first six criteria, and may also represent
other characteristics that you feel are important.)

Teachers were asked to make each of their ratings on a nine-point
response scale which varied from "1-Very Poor" to "9-Very Good", and to
adhere to standards of quality that they felt were appropriate for year
seven. The teachers were asked to make all ratings for each essay
after a single reading (i.e., they were not asked to reread the set of
essays separately in order to make each rating).

Three university students, who were in the process of completing a
degree in Education which would qualify them to teach English in
secondary schools, were also asked to evaluate the essays. The
student-teachers were selected by a university lecturer as being good,
responsible students in the teacher education program. However, except
for practice teaching, these student-teachers had had no actual
classroom teaching experience. The student-teachers were given exactly
the same set of instructions as the master teachers and were requested
to evaluate the essays according to the specific components of writing
effectiveness and to provide an overall evaluation, but they had not
made early ratings 10 months prior to this task as had the master
teachers.

The following set of scores, derived from the procedures described
above, was computed for each of the 139 students who completed essays
for this study:

Validity Criterion -- 1 score based upon school performance on the
essay test administered by the school.

Early Ratings -- 3 scores, one from each master teacher, which
represent global, holistic impressions of essays in this study.

Component Ratings -- 36 scores, six from each of the three
student-teachers and six from each of the three master teachers,
representing scores on the specific components used in evaluating the
essays in this study.

Overall Ratings -- 6 scores, one from each of the teachers,
representing global, holistic impressions of the essays used in this
study at the time of the second rating.

Total Ratings -- 6 scores, one from each of the teachers, representing
the sum of scores on the six component ratings (but not the overall
rating).

In addition to the 52 scores described above, nine scores for each
essay were obtained by summing across the responses by the three master
teachers for the early ratings, the six component ratings, the overall
ratings, and the total ratings. Eight corresponding scores were
obtained by summing across responses by the three student-teachers for
all but the early ratings (student-teachers did not make early
ratings).
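
The score counts above can be verified with a little bookkeeping. The
sketch below only restates the arithmetic in the text; the variable
names are illustrative.

```python
n_master, n_student, n_components = 3, 3, 6
n_teachers = n_master + n_student

# Scores computed for each essay from individual raters.
per_essay_scores = {
    "validity_criterion": 1,
    "early_ratings": n_master,                      # master teachers only
    "component_ratings": n_teachers * n_components,
    "overall_ratings": n_teachers,
    "total_ratings": n_teachers,
}
individual_total = sum(per_essay_scores.values())

# Scores summed across the three raters in each group:
# early + six components + overall + total for master teachers,
# the same without the early rating for student-teachers.
master_sum_scores = 1 + n_components + 1 + 1
student_sum_scores = n_components + 1 + 1
```

The totals come to 52 individual scores, nine summed scores for master
teachers, and eight for student-teachers, as stated.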

Statistical and Multitrait-multimethod Analyses

Correlations among various sets of scores were used to determine
the single rater reliability and the validity of the overall and total
scores. However, an important aspect of this study was to determine
the extent to which teachers can differentiate among the various
components of writing effectiveness described above.
Multitrait-multimethod (MTMM) analyses, where responses by different
teachers correspond to methods of evaluation and the specific
components of writing effectiveness correspond to different traits, are
ideally suited to this purpose. In MTMM analyses the distinction is
made between convergent validity, the agreement between different
raters on the same component, and divergent validity, the ability of
the raters to differentiate among the different components. Hence, the
convergent validities in multitrait-multimethod analyses are really
single rater reliabilities in this particular application. This
distinction is important in the interpretation of the findings, but in
no way affects the actual procedures in conducting MTMM analyses (for
further discussion of this distinction see Marsh, Smith, Barnes &
Butler, 1983; Marsh, Barnes & Hocevar, in press). Three approaches to
MTMM analyses are briefly summarized below, but an extensive review of
the procedures is beyond the scope of this paper and the interested
reader is referred to Marsh and Hocevar (1983, in press; also see
Kenny, 1979; Schmitt, Coyle & Saari, 1977).

Campbell and Fiske (1959) argue that the demonstration of
construct validity requires both convergent and discriminant validity;
that is, multiple indicators of the same component of writing should be
substantially correlated with each other, but less correlated with
indicators of other components. Convergent validity is inferred from
agreement between measures of the same component of writing
effectiveness assessed by different teachers. Discriminant validity or
divergent validity refers to the distinctiveness of the different
traits, and in this case is inferred from the relative lack of
correlation between different components of writing effectiveness.
Campbell and Fiske proposed four guidelines for evaluating MTMM
matrices. These guidelines have been criticized, but they still
represent the most frequently employed strategy, are useful, and are
recommended as the first step in analysis of MTMM data (Marsh &
Hocevar, 1983; in press).
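
The comparisons behind this kind of inspection can be sketched in
Python. This is a rough illustration rather than the procedure used in
the study: the function name, the choice to average each class of
correlations, and the simulated data are all assumptions. The sketch
groups the off-diagonal correlations of an MTMM matrix into convergent
validities (same component, different teachers), correlations between
different components rated by the same teacher, and correlations
between different components rated by different teachers.

```python
import numpy as np


def campbell_fiske_summary(corr, n_methods, n_traits):
    """Average the three kinds of off-diagonal MTMM correlations.

    Variables are assumed to be ordered method by method, with the
    traits in the same order inside each method block.
    """
    corr = np.asarray(corr, dtype=float)
    groups = {
        "convergent": [],
        "heterotrait_monomethod": [],
        "heterotrait_heteromethod": [],
    }
    n_vars = n_methods * n_traits
    for a in range(n_vars):
        for b in range(a + 1, n_vars):
            method_a, trait_a = divmod(a, n_traits)
            method_b, trait_b = divmod(b, n_traits)
            if trait_a == trait_b:
                # same component rated by two different teachers
                groups["convergent"].append(corr[a, b])
            elif method_a == method_b:
                # different components rated by the same teacher
                groups["heterotrait_monomethod"].append(corr[a, b])
            else:
                groups["heterotrait_heteromethod"].append(corr[a, b])
    return {name: float(np.mean(vals)) for name, vals in groups.items()}


# Simulated (hypothetical) ratings: 3 distinct traits, 2 raters, each
# rating = trait score + a rater-specific halo + random error.
rng = np.random.default_rng(0)
n_essays, n_methods, n_traits = 2000, 2, 3
traits = rng.standard_normal((n_essays, n_traits))
halos = rng.standard_normal((n_essays, n_methods))
columns = []
for m in range(n_methods):
    for t in range(n_traits):
        noise = rng.standard_normal(n_essays)
        columns.append(traits[:, t] + 0.5 * halos[:, m] + 0.5 * noise)
scores = np.column_stack(columns)
summary = campbell_fiske_summary(
    np.corrcoef(scores, rowvar=False), n_methods, n_traits
)
```

In this simulation the traits are built to be distinct, so the
convergent validities exceed both classes of heterotrait correlations.
When raters cannot separate the components, the heterotrait
correlations approach the convergent validities.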

An ANOVA model (Kavanagh, et al., 1971) provides a more analytic
approach to MTMM analysis. When repeated measures of cases (the
essays in this application) are measured over all levels of traits
(the rating components) and methods (the teachers), three sources of
variation can be identified. The main effect of essays is a test of
how well the ratings discriminate between essays, and is taken to be an
indication of convergent validity. The essay-by-trait interaction tests
whether differentiation among the essays depends upon the specific
components of writing effectiveness; if it does not then the components
have no discriminant validity. The essay-by-teacher interaction tests
whether the differentiation depends upon teachers; if it does then the
different teachers introduce a source of systematic (undesirable)
variance which is taken to be an indication of method/halo effect.
Kavanagh, et al. (1971; also see Marsh & Hocevar, 1983) describe
procedures whereby these effects and corresponding variance components
can be obtained directly from the MTMM matrix, and these are employed
in the present application. However, despite the convenience of
statistical tests and summary statistics, this procedure has important
limitations; the effects tested with this model bear no
straightforward correspondence to the interpretation of convergent and
discriminant validity as used in other MTMM analytic strategies, and it
is recommended only to supplement the application of other approaches
(Marsh & Hocevar, 1983).

Confirmatory factor analysis (CFA) has more recently been applied
to the analysis of MTMM matrices. MTMM matrices, like any other
correlation matrix, can be used to infer the underlying dimensions that
are being measured. In the present application, factors defined by the
measures of the same component of writing effectiveness support their
construct validity, while factors defined by different components rated
by the same teacher argue for method/halo effects.
Conventional/exploratory factor analysis, because of the indeterminacy
of the solution and the researcher's relative lack of ability to define
a model, is generally inappropriate for analyzing MTMM matrices. With
confirmatory factor analysis, the researcher is able to specify
different models and to determine how well these various models fit the
data. Hence, the analysis of the MTMM matrix can be viewed as a
straightforward application of confirmatory factor analysis with a
priori factors corresponding to specific methods and traits, and the
findings can be interpreted in the same way as can other confirmatory
factor analyses.

In the present application, the CFA was conducted with the
commercially available LISREL V program (Joreskog & Sorbom, 1981). The
most general MTMM model employed in this study consisted of 12 factors
representing the six components of effective writing (traits) and the
six teachers (methods). Each of the 36 measured variables was used to
define one method factor and one trait factor, while loadings on the
other 10 factors were fixed to be zero. For example, the rating by the
first teacher of the Mechanics component was used to define the method
factor for the first teacher (along with the other five ratings by the
same teacher) and the Mechanics trait factor (along with the other five
ratings of Mechanics by each of the other teachers). Hence, the 36
measured variables were used to define 72 factor loadings, and all the
other factor loadings were fixed to be zero. The 15 correlations
among the six method factors and the 15 correlations among the six
trait factors were estimated in the analysis, but correlations between
method and trait factors were fixed to be zero. The 36
error/uniquenesses, one for each measured variable, were defined so as
to form a diagonal matrix so that the error terms were uncorrelated.
This pattern of loadings represents the standard model used in the
analysis of MTMM matrices (see Marsh & Hocevar, 1983; in press). The
fit of this CFA model to the data was assessed by the magnitude of the
parameter estimates, the ratio of the chi-square to the degrees of
freedom in the analysis, the root mean square of the residual
differences between the observed and reproduced correlation matrices,
and coefficient d, which scales the observed chi-square along a scale
of zero-to-one where the end-points represent a null fit and a perfect
fit (see Bentler & Bonett, 1980; Joreskog & Sorbom, 1981; Marsh &
Hocevar, 1983; in press; Maruyama & McGarvey, 1980). As yet there are
no universally accepted measures of goodness of fit in CFA (Marsh &
Hocevar, 1984), but the most widely applied indicator is the
chi-square/df ratio, where values of less than 2.0 are taken as an
indication of a good fit (despite the relationship between this
indicator and sample size), while the coefficient d provides an index
analogous to measures of the proportion of variance explained in ANOVA
procedures.
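The degrees of freedom implied by this specification can be checked by
counting parameters. The sketch below is not taken from the LISREL run;
it assumes the model is fitted to the 36 x 36 correlation matrix with
factor variances fixed at 1, so that the free parameters are only the
loadings, the factor correlations, and the 36 error/uniquenesses. The
function name cfa_degrees_of_freedom is hypothetical.

```python
def cfa_degrees_of_freedom(n_vars, n_loadings, n_factor_corrs):
    """Degrees of freedom for a CFA model, by simple parameter counting.

    Assumption (not stated in the text): factor variances are fixed at 1
    and there is exactly one uniqueness term per measured variable.
    """
    n_moments = n_vars * (n_vars + 1) // 2          # distinct (co)variances
    n_free = n_loadings + n_factor_corrs + n_vars   # loadings + corrs + uniquenesses
    return n_moments - n_free


# General model: 72 loadings, 15 method + 15 trait factor correlations.
general_df = cfa_degrees_of_freedom(36, 72, 30)
# Methods-only (or traits-only) model: 36 loadings, 15 factor correlations.
single_set_df = cfa_degrees_of_freedom(36, 36, 15)
print(general_df, single_set_df)
```

Under these assumptions the count for a model with only six method (or
only six trait) factors agrees with the 579 degrees of freedom reported
for models 3 and 4 in Table 5.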

Overall and Total Ratings.

The first purpose of this study is to determine the ability of
master and student teachers to assess overall writing effectiveness.
Single-rater reliabilities, correlations among the overall ratings and
among the total ratings, and the validity coefficients (see Table 1)
are consistently high and remarkably uniform for both student and
master teachers. Correlations among the six total scores vary between
0.68 and 0.78 (mean r = 0.75); correlations among student teachers
(mean r = 0.69), among master teachers (mean r = 0.72), and between
student and master teachers (mean r = 0.73) are nearly the same. A
similar pattern of slightly smaller correlations (mean r = 0.67) exists
among the overall ratings, and among the early ratings by the three
master teachers (mean r = 0.71). Hence, the correlation between
ratings by any two teachers, whether student or master teachers, is
approximately 0.70 whether based upon total scores, on the overall
rating, or on the early ratings, which were available only for master
teachers.

Marks on the school performance essay examination provide one
external criterion of validity against which to assess the ratings.
Correlations between master teacher ratings and the criterion are again
close to 0.7 whether based upon total scores (mean r = 0.71), overall
ratings (mean r = 0.69) or the early ratings (mean r = 0.68), while
correlations between the criterion and student-teacher ratings are
nearly as high (mean r's = 0.66 & 0.65 for total and overall ratings).

Correlations between overall and total ratings by the same person
(e.g., O1 & T1) are quite high (mean r = 0.96), indicating that the sum
of the component ratings is measuring a construct which is nearly the
same as the overall rating. Correlations between early ratings and
subsequent ratings by the same master teacher are also high for both
overall ratings (mean r = 0.80) and total ratings (mean r = 0.82),
indicating that the ratings are stable over time.

The focus here, as well as in subsequent analyses, is on the
relative agreement between different teachers, based upon correlations
among their ratings. However, the means of the different ratings in
Table 1 also provide a basis for looking at absolute differences.
Master teachers, based upon overall ratings and total scores, assign
somewhat lower marks than do student teachers. It is interesting to
note, however, that the early ratings by the group of master teachers
are also somewhat lower than are the marks assigned on the school
performance test by other experienced teachers, though the two sets of
marks are for different tasks and may not be strictly comparable (see
footnote 1).

In summary, correlations between the ratings by any two teachers,
whether they be student or master teachers, and correlations between
any teacher's rating and the validity criterion are all approximately
0.70. Correlations based upon the total scores are slightly higher in
each of the various comparisons, but the differences are small.
Correlations between ratings by the same teacher at two different times
are higher, suggesting that there is a small systematic method/halo
effect in the ratings by each teacher which generalizes over time. The
similarity in correlations between ratings by different teachers in our
study, and between their ratings and the validity criterion, apparently
reflects two countervailing effects: the validity correlations should
be lower since they are based upon ratings of a different essay, but
should be higher in that the validity criterion, based upon ratings by
two teachers, is probably more reliable than ratings of essays by any
one teacher in this study.

Multitrait-Multimethod (MTMM) Analyses.

The second purpose of this study is to determine if teachers are
able to distinguish among the different components of writing
effectiveness. This is examined in various analyses based upon the
MTMM matrix (Table 2), where correlations in the triangular
(heterotrait-monomethod) blocks represent correlations among the
component ratings by the same teacher, correlations in the square
(heterotrait-heteromethod) blocks represent correlations based upon
ratings by different teachers, and convergent validities (the diagonals
of the square blocks, which are underlined in Table 2) represent
agreement between two different teachers on the same component.

Campbell-Fiske Guidelines. The application of the four Campbell-
Fiske guidelines indicates:

1) The convergent validities, ranging from 0.32 to 0.75 (median r =
0.60), are all statistically significant, though those based upon
master-teacher ratings (median r = 0.63) are slightly higher than for
student-teacher ratings (median r = 0.55).

2) The convergent validities are higher than other correlations in the
same row and column of the square blocks for only 70% of the
comparisons, and the percentages are similar when ratings by student
teachers (74%) and master teachers (70%) are considered separately.
None of the components satisfies this test for all the comparisons, and
the convergent validities (median r = 0.60) are only slightly higher
than the correlations with which they are compared (median r = 0.55).

3) The convergent validities (median r = 0.60) are higher than the
correlations in the same row and column of the corresponding triangular
blocks (median r = 0.73) in only 12% of the comparisons based upon the
entire matrix, and in 1?% and 3% respectively when ratings by student
and master teachers are considered separately. The median of
correlations against which the convergent validities are compared is
higher here than those for comparison in guideline 2 (0.73 vs. 0.55),
suggesting a halo/method effect in the ratings of different teachers.

4) The pattern of correlations among different components is somewhat
similar for each of the different teachers; the highest correlations
generally occur between ratings of Mechanics and Sentence Structure,
and between ratings of Content/Ideas and Quality of Style, and the
lowest correlations generally occur between ratings of Content/Ideas
and ratings of either Mechanics or Sentence Structure.
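The comparisons behind guidelines 2 and 3 can be sketched in a few lines
of code. The example below is illustrative only: the function name
campbell_fiske_proportions and the small 2-trait by 2-method matrix are
hypothetical and are not data from this study. It assumes the variables
in the correlation matrix are ordered method by method, with traits in
the same order within each method.

```python
from itertools import combinations


def campbell_fiske_proportions(corr, n_traits, n_methods):
    """Proportion of comparisons satisfying guidelines 2 and 3.

    corr is a full symmetric correlation matrix (list of lists) whose
    variables are ordered method by method, trait by trait within method.
    """
    def idx(method, trait):
        return method * n_traits + trait

    g2_hits = g3_hits = n_comparisons = 0
    for m1, m2 in combinations(range(n_methods), 2):
        for t in range(n_traits):
            convergent = corr[idx(m1, t)][idx(m2, t)]
            for u in range(n_traits):
                if u == t:
                    continue
                # Guideline 2: same row and column of the square
                # (heterotrait-heteromethod) block.
                square = (corr[idx(m1, t)][idx(m2, u)],
                          corr[idx(m1, u)][idx(m2, t)])
                # Guideline 3: same row and column of the two triangular
                # (heterotrait-monomethod) blocks.
                triangle = (corr[idx(m1, t)][idx(m1, u)],
                            corr[idx(m2, t)][idx(m2, u)])
                g2_hits += sum(convergent > r for r in square)
                g3_hits += sum(convergent > r for r in triangle)
                n_comparisons += 2
    return g2_hits / n_comparisons, g3_hits / n_comparisons


# Hypothetical matrix, variable order: A1, B1, A2, B2 (trait, rater).
toy = [
    [1.00, 0.50, 0.70, 0.40],
    [0.50, 1.00, 0.30, 0.60],
    [0.70, 0.30, 1.00, 0.65],
    [0.40, 0.60, 0.65, 1.00],
]
g2, g3 = campbell_fiske_proportions(toy, n_traits=2, n_methods=2)
print(g2, g3)
```

With six traits and six raters the same loops make 15 x 6 x 10 = 900
comparisons for each guideline, which is why percentages rather than
individual correlations are reported.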

In summary, the application of the Campbell-Fiske guidelines
provides strong support for the convergent validity of the ratings, but
not for their divergent validity. These findings suggest that while
there is good agreement between the ratings of different teachers in a
general sense, as was observed with the global and total ratings,
teachers are not able to distinguish clearly between specific
components of writing effectiveness. Surprisingly, better, albeit
weak, support for the divergent validity of the ratings came from
responses by student teachers than by the master teachers. Also,
inspection of Table 3 indicates that there was better support for the
divergent validity of some components (e.g., Mechanics and Word Usage)
than for that of others (e.g., Content/Ideas and Organization).

ANOVA Analysis of the MTMM Matrix. The results of the ANOVA model
applied to the entire MTMM matrix, and separately to student and master
teacher ratings (see Table 4), are generally consistent with the results
of the Campbell-Fiske analysis. In each of the analyses:

1) the effect of the essays (the convergent validity effect) is large
and statistically significant;

2) the effect of the essay-by-teacher interaction (the method/halo
effect) is moderate and statistically significant; and

3) the effect of the essay-by-component interaction (the divergent
validity effect) is small and does not even reach statistical
significance when the master teacher ratings are considered separately.

Hence, these analyses also suggest good convergent validity, but the
relative inability of teachers, particularly the master teachers,
to distinguish among the different components of writing effectiveness.

Confirmatory Factor Analysis (CFA) of the MTMM Matrix. Recently,
analysis of MTMM data with the Campbell-Fiske guidelines or the ANOVA
model has been criticized, and the use of CFA has been recommended
(see Marsh & Hocevar, for an overview). When this approach is
used in the most general model, separate factors representing traits
and methods are hypothesized, and the ability of such a model to fit
the data is quite good (i.e., model 1 in Table 5 has a chi-square/df
ratio of 1.6, and has a coefficient d = 0.868). However, much of the
variance explicable by this model can be explained by model 2, a model
which contains only a single, general factor (coefficient d = 0.621).
Models 3 and 4 test the ability of trait factors without any method
factors (model 3) and method factors without any trait factors (model
4) to explain the data. Model 3, hypothesizing six trait factors, does
little better than model 2, where a single general factor is
hypothesized (0.642 vs. 0.621), while model 4, hypothesizing six
method factors, does nearly as well as model 1 (0.80 vs. 0.868). The
addition of a general factor to models 3 and 4 improves their ability
to fit the data (models 5 and 6).
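The chi-square/df ratios for the competing models can be recomputed
directly from the chi-square and df values in Table 5. The short sketch
below uses the values for models 3 to 6 as read from the table (the
chi-square for model 1 and the df for model 2 are not legible in the
available copy, so they are left out) and applies the 2.0 rule of thumb
mentioned earlier; the variable names are illustrative.

```python
# (chi-square, df) pairs as read from Table 5 for models 3 to 6.
fit = {
    "model 3 (6 traits)": (2294, 579),
    "model 4 (6 methods)": (1296, 579),
    "model 5 (6 traits + 1 general)": (1776, 543),
    "model 6 (6 methods + 1 general)": (997, 543),
}

# Chi-square/df ratio for each model, rounded as in the table.
ratios = {name: round(chi2 / df, 2) for name, (chi2, df) in fit.items()}

# Models meeting the conventional "ratio below 2.0" rule of thumb.
good_fit = [name for name, ratio in ratios.items() if ratio < 2.0]

for name, ratio in ratios.items():
    print(f"{name}: chi-square/df = {ratio}")
print("ratio below 2.0:", good_fit)
```

Among these four models, only the model with six method factors plus a
general factor falls below 2.0, which is in line with the pattern of
coefficient d values described above.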

The ability of the alternative models to fit the MTMM data
supports the general findings of earlier analyses of the same data.
Much of the variance can be explained by a single, general factor which
incorporates all the component ratings by all the teachers (model 2).
The method-only model (model 4) explains more of the variance than does
the trait-only model, suggesting a method/halo effect but weaker
support for divergent validity. The finding that one general factor
can explain nearly as much variance as the six trait factors suggests
that there is almost no discriminant validity at all.

The traditional interpretation of the CFA model suggests that
method factors are indicative of a bias, while trait factors are
indicative of validity. In order to test this interpretation in the
present application, a 13th factor, representing the convergent
validity criterion, was added to model 1, and the parameter estimates
are shown in Table 6. As in model 1 (whose parameter estimates are
nearly the same for the first 12 factors), the measured variables loaded
substantially on the method factors and less substantially on the trait
factors. Furthermore, correlations among the trait factors are
generally quite high and in some cases approach 1.0. However, of
particular interest here are the correlations between the 13th factor
(the validity criterion is labelled "V" in Table 6) and the other
factors. Correlations between the validity factor and the method
factors are large (median r = 0.67), while correlations between the
validity factor and the trait factors are much smaller (median r =
0.24). Thus, at least in this application, the interpretation of the
method factors as indicating bias seems unwarranted. Instead, the so-
called method factors appear to represent a general component from the
ratings by each teacher which is highly correlated with an external
validity criterion. The high correlations among the different method
factors (median r = 0.72) are also inconsistent with an interpretation
that each of these factors represents a method/halo effect which is
idiosyncratic to the ratings by each teacher.
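The three medians quoted in this paragraph can be reproduced from the
factor correlations in Table 6. The lists below are transcribed from
that table as well as the scan allows (for instance, the entry printed
as "3b" is read as 30), so they should be treated as a reading of the
table rather than authoritative data; the variable names are
illustrative.

```python
from statistics import median

# Correlations (x100) of the validity factor V with method factors m1-m6.
validity_with_methods = [66, 56, 69, 68, 66, 67]
# Correlations (x100) of V with trait factors t1-t6.
validity_with_traits = [28, 25, 14, 30, 17, 22]
# The 15 correlations (x100) among the six method factors.
method_intercorrelations = [71, 72, 71, 72, 73, 79, 66, 65,
                            79, 77, 72, 72, 78, 67, 65]

median_methods = median(validity_with_methods) / 100
median_traits = median(validity_with_traits) / 100
median_between_methods = median(method_intercorrelations) / 100
print(median_methods, median_traits, median_between_methods)
```

The results (0.665, 0.235 and 0.72) match the reported medians of 0.67,
0.24 and 0.72 once the first two are rounded to two decimal places.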

Summed Student and Master Teacher Ratings. Responses by the three
student teachers were summed to form summed ratings of each of the six
components of writing effectiveness, as were the responses by the three
master teachers. A new MTMM matrix (see Table 7) was formed where the
six writing components represented traits and the two types of teacher
represented methods. It was hoped that these summed ratings, since
they are more reliable, would provide stronger support for the
discriminant validity of the ratings. As expected, the convergent
validities are quite substantial (median r = 0.84). There is modest
support for the divergent validity of ratings of Mechanics, Sentence
Structure, and Word Usage in that the convergent validities are higher
than other correlations in the square block (the second Campbell-Fiske
guideline) and higher than correlations among the different student-
teacher ratings (the third Campbell-Fiske criterion), even though they
are generally lower than the correlations among master-teacher ratings.
Nevertheless, even here, there is only modest support for the ability
of ratings to differentiate among the different components of effective
writing.
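The claim that summed ratings are more reliable can be illustrated with
the Spearman-Brown formula. This formula is not used in the text; it is
brought in here only as a standard way to project the reliability of a
sum of k raters from a single-rater reliability, and the value 0.70 is
the approximate single-rater correlation reported earlier. The second
part recomputes the median convergent validity from the six
student-teacher by master-teacher same-trait correlations as read from
Table 7. Function and variable names are hypothetical.

```python
from statistics import median


def spearman_brown(single_rater_r, k):
    """Projected reliability of the sum of k raters (Spearman-Brown)."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


# Three raters per group, single-rater reliability of about 0.70.
projected = spearman_brown(0.70, 3)
print(round(projected, 3))

# Same-trait ST-MT correlations (x100) read from Table 7, in the order
# Mechanics, Sentence, Organization, Word Usage, Content, Quality.
convergent = [87, 86, 79, 86, 75, 81]
median_convergent = median(convergent)
print(median_convergent / 100)
```

A projected reliability near 0.875 for each group's summed rating is of
the same order as the convergent validities in Table 7, whose median of
0.835 rounds to the reported 0.84.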

The validity criterion and summed responses to the overall
ratings, total ratings, and early ratings also appear in Table 7.
Agreement between student and master teachers is particularly high for
the total ratings (r = 0.91) and somewhat higher than correlations
involving the overall and early ratings. The total scores by student
and master teachers are also somewhat more highly correlated with the
validity criterion (r's = 0.76 & 0.79) than the overall ratings or the
early ratings, though all correlations are high and differences are
small. Total ratings, overall ratings, and early ratings tend to be
more highly correlated with ratings of Quality of Style and Word Usage
than with other specific components, but again, all the correlations
are large. These findings offer further support for the reliability
and validity of the ratings by master and student teachers, and limited
support for their ability to distinguish among some components of
writing effectiveness.

DISCUSSION 

A variety of different analyses have demonstrated that ratings of
writing effectiveness by master teachers and by student teachers are
substantially correlated with each other and with an external validity
criterion representing actual school performance. Agreement among
ratings by different teachers, and between these ratings and the
validity criterion, were particularly high for the sum of ratings to
specific components of writing effectiveness, but were nearly as high
for overall, global ratings. Student-teacher ratings, using a variety
of different comparisons, were nearly as reliable and valid as master-
teacher ratings, and student teachers seemed better able to
differentiate among the components of writing effectiveness.

The results of the study provide a number of surprises,
particularly when compared with the results which were predicted. On
the positive side, single-rater reliabilities and validity coefficients
were substantially higher than expected. As expected, the total
ratings did somewhat better than did overall, holistic responses, but
the differences were small. Of surprise was the finding that student
teachers did nearly as well as master teachers on most comparisons, and
perhaps were better able to differentiate among the different
components of writing effectiveness. On the negative side, the
predicted ability of teachers to differentiate among the components of
writing effectiveness was so weak as to be of little practical value.

The sizes of the single-rater reliabilities and validity
coefficients are larger than typically found, even when raters receive
extensive training, when essays are marked in highly controlled
situations, and when essays are much longer. This demonstration is
important and may reflect the fact that essay testing is more common in
Australia so that both the students and the raters are more familiar
with the task. This finding is also important because it demonstrates
that raters were able to maintain a high level of concentration
throughout the task so that this cannot account for their apparent
difficulty in distinguishing among traits.

The apparent difficulty that readers have in distinguishing among
components of writing effectiveness is consistent with other research.
We know of no other research where raters have been able to clearly
differentiate among multiple components of writing effectiveness, though
there is relatively little research which has employed rigorous tests
of this conclusion. Here, as in the studies by Smith (1979) and
Quellmalz et al. (1982) where components were more explicitly defined
and raters received considerable training, the most distinguishable
component was mechanics. Alternative strategies might provide better
differentiation among the components of writing effectiveness, but only
at the expense of applicability. This is important since the goal
of the present investigation is to devise a procedure which is likely
to be employed by classroom teachers. Teachers could be asked to
perform multiple sorts of the essays into separate piles, once for each
specific component. However, such a procedure would require much more
time than a holistic strategy or the one employed here, and this might
not be acceptable in many settings. Also, teachers could be asked to
judge four or five subcategories within each of the components of
effective writing, and these ratings could then be factor analyzed to
test the hypothesized factor structure. While this would probably
improve the differentiation among the components, it would also require
considerably more time and might be unacceptable in many settings.
Teachers could be asked to participate in extensive training programs
where the rating categories are more explicitly defined and feedback is
provided on practice essay marking, but previous research has not shown
even this to produce clear differentiation among multiple components of
effective writing. We believe that further research such as suggested
here will demonstrate the multidimensionality of writing effectiveness,
and that the goal of this research should be to demonstrate how this
can be best accomplished. The use of MTMM and CFA as demonstrated here
provides an important tool for such research on writing effectiveness.




Footnotes

1 The statistical significance of differences between student and
master teachers was based upon comparisons of their summed overall
ratings and total scores; in each case master teacher responses were
significantly lower (t(138) = 7.28 & 6.21 respectively, p < .001). A
similar comparison between the summed early ratings by the master
teachers and the validity criterion based upon school performance also
showed that the master teacher ratings were significantly lower (t
(138) = 9.52, p < .001). However, since the essays evaluated in this
final comparison were not the same, the significant effect may reflect
differences in grading standards, or differences in the quality of the
essays being evaluated.
TABLE 1

Total and Overall Ratings By Different Teachers

                          Mean       SD
Total Scores
  Student Teachers
    Total1               36.55     7.60
    Total2               38.09     5.64
    Total3               38.11     8.24
  Master Teachers
    Total4               35.04    11.23
    Total5               33.52    11.64
    Total6           [illegible]   8.63
Overall Ratings
  Student Teachers
    Over1                 6.37     1.37
    Over2                 6.42     1.24
    Over3                 6.73     1.48
  Master Teachers
    Over4                 5.87     1.90
    Over5                 5.50     2.26
    Over6                 6.01     1.59
Early Ratings
  Master Teachers
    Early4               53.53    17.83
    Early5               60.68    19.54
    Early6               64.48    13.31
School Perf              67.62    12.95

Correlations
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T5 


T6 
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71 


100 
























69 




IGD 






















72 


74 


77 


100 




















68 


69 


7? 
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too 


















73 


73 


76 
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68 


100 
















93 
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67 
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69 


69 
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70 
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66 


74 


68 


71 


67 
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65 


64 


94 


71 


71 


72 


63 


62 
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73 


77 
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78 
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70 


IQQ 








64 
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74 


96 


65 
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65 


69 


73 


100 






70 


69 


72 


68 


67 


96 


67 


70 


67 


68 


68 


100 




67 
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37 


72 


69 


66 


69 


66 


87 


67 


65 


100 


66 


68 


71 


80 


75 


68 


64 


66 


64 


79 


71 


66 


73 


71 


67 


73 


74 


67 


84 


67 


65 


71 


75 


64 


81 


69 71 100 


69 


53 


72 


73 


70 


70 


65 


62 


67 


73 


68 


67 


72 65 68 1 



Note: All coefficients, presented without decimal points, are statistically
significant. The numbers at the end of each variable name refer to raters,
where 1, 2, and 3 were student teachers, and 4, 5, and 6 were master teachers.
TABLE 2

MTMM: Student Teacher and Master Teacher Ratings of Six Essay Traits

M1 S1 O1 W1 C1 Q1 M2 S2 O2 W2 C2 Q2 M3 S3 O3 W3 C3 Q3 M4 S4 O4 W4 C4 Q4 M5 S5 O5 W5 C5 Q5 M6 S6 O6 W6 C6 Q6

M1 100
S1 ?? 100
O1 59 63 100
W1 69 68 65 100
C1 55 56 73 62 100
Q1 61 57 64 69 72 100

M2 62 55 47 52 45 45 100
S2 52 57 51 48 45 41 70 100
O2 44 47 42 46 46 39 63 65 100
W2 49 48 47 62 47 49 58 55 59 100
C2 38 41 34 47 41 50 43 39 47 60 100
Q2 55 55 44 60 50 52 66 62 63 77 71 100

M3 58 57 48 59 40 47 53 49 37 55 41 50 100
S3 48 45 49 55 43 45 58 59 41 57 38 51 31 100
O3 50 52 50 52 5? 42 57 56 48 51 37 48 6? 67 100
W3 53 57 53 67 56 55 56 50 41 65 47 60 74 72 69 100
C3 41 40 52 51 51 51 47 43 36 44 32 46 62 66 67 70 100
Q3 47 41 46 53 50 54 54 33 34 52 45 56 65 70 63 72 76 100

,1 60 55 66 48 53 63 51 45 64 46 62 75 64 63 67 55 59 ^ 

62 66 58 67 49 54 63 59 51 65 44 65 74 70 64 67 52 59 

56 58 52 63 48 55 53 56 56 65 49 65 66 62 62 65 54 58 ^6 86 

59 58 54 70 49 55 60 54 58 66 48 62 72 67 60 72 57 57 °f f. 
55 55 46 65 45 57 57 49 48 62 54 68 67 62 57 66 63 57 °2 % °j % J°^,„. 

63 62 54 73 50 56 60 54 49 65 52 63 71 67 60 71 53 59 «5 87 84 91 90 100 






63 58 46 58 45 54 62 53 42 54 46 56 66 61 61 68 50 59 f., .^i 5^ ^2 

53 59 46 60 49 52 64 59 44 53 42 57 67 66 63 69 52 65 f- f. f. f. °° 100 

52 50 45 58 41 48 51 42 44 56 47 54 62 60 63 67 53 62 68 68 67 65 63 66 81 83 100^ 
57 58 45 67 46 56 51 42 42 58 46 58 62 59 60 68 56 64 ^3 63 72 70 74 75 76 7? 100 

53 50 44 61 45 57 52 45 42 60 53 62 58 57 5? 64 51 59 °^ °^ °° ^5 |! 
56 54 45 61 42 53 55 48 43 58 53 60 63 63 60 69 51 60 72 68 70 67 71 84 8Z 86 84 88 100 



7b 65 51 60 46 49 62 59 49 54 36 58 62 63 57 56 46 50 $6 52 56 56 61 63 66 54 58 g 5? W 

67 64 50 58 45 49 61 56 50 58 42 58 62 63 62 63 55 60 58 63 54 54 55 60 64 68 6Q 62 55 60 89 100^^^ 

53 52 52 53 46 44 48 51 58 55 38 50 51 55 62 55 50 49 5 « 5 7 8 0^^^ 

56 54 49 62 52 59 56 46 51 60 42 57 63 63 58 70 63 67 ^2 61 57 63 55 60 55 63 54 56 68 79 72 100 ^ 

42 40 49 53 48 55 49 35 58 57 44 53 46 47 48 49 55 4? | f ... 

54 48 48 -^1 49 59 54 46 51 60 46 56 62 61 58 61 59 65 ^7 57 60 52 58 55 61 55 57 71 77 73 88 77 100 

Note: All coefficients are presented without decimal points. Each variable is
labelled with a letter-number combination where the letters stand for traits
(M=mechanics, S=sentence structure, O=organization, W=word usage,
C=content/ideas, Q=quality of style) and numbers stand for raters (1, 2, and 3
are student teachers; 4, 5, and 6 are master teachers).
TABLE 3

Comparisons Involving the Second Campbell and Fiske Guideline:
Number and Percentage Rejections

                               Comparisons Involving
                     Student Teachers   Master Teachers   All Teachers
Trait                    N      %          N      %         N      %

Mechanics                2      7          9     30        23     ??
Sentence Structure       7     23          8     27        33     23
Organization            17     57          9     30        43     ??
Word Usage               0      0          6     20        15     10
Content/Ideas           20     67         18     60        91     61
Quality of Style         3     10          3     10        48     32
TABLE 4

ANOVA Analyses of MTMM: Combined, Student Teachers, Master Teachers

                        Combined                   Master Teacher               Student Teacher
Source              df    MS  F-Ratio  Var      df    MS  F-Ratio  Var      df    MS  F-Ratio  Var

Cases (C)          138  24.7   164.4   .68     138  12.2    68.5   .68     138  10.2    35.0   .55
 (convergence)
C x Trait (T)      690   0.5     3.4   .06     690   0.2     1.4   .01     690   0.5     1.6   .06
 (divergence)
C x Method (M)     690   1.1     7.3   .16     276   1.4     7.6   .20     276   1.4     4.7   .18
 (halo)
C x T x M         3450   0.2           .15    1380   0.2           .13    1380   0.3           .29
 (error)

Note: All effects are statistically significant (p < .01) except the
divergence effect for Master Teachers (p > [illegible]). Variance Components (Var)
are defined as described by Marsh and Hocevar (1983).
TABLE 5

Model   Factors                    Chi-square      df    Chi-square/df   RMSR   Coefficient d

1       6 Traits & 6 Methods      [illegible] [illegible]    1.60        .051       .868
2       1 General Factor              2429    [illegible]    4.10        .076       .621
3       6 Traits                      2294        579        3.96        .073       .642
4       6 Methods                     1296        579        2.24        .051       .800
5       6 Traits & 1 General          1776        543        3.27        .062       .723
6       6 Methods & 1 General          997        543        1.84        .040       .844
TABLE 6

LISREL Estimates For Model With 6 Methods, 6 Traits and a
13th Factor Representing an External Validity Criterion

[Factor loadings and error/uniqueness (E/U) estimates: not legible in the source]

Factor Correlations

      m1   m2   m3   m4   m5   m6   t1   t2   t3   t4   t5   t6    V

m1   100
m2    71  100
m3    72   71  100
m4    72   73   79  100
m5    66   65   79   77  100
m6    72   72   78   67   65  100
t1    00   00   00   00   00   00  100
t2    00   00   00   00   00   00   99  100
t3    00   00   00   00   00   00   64   76  100
t4    00   00   00   00   00   00   70   74   61  100
t5    00   00   00   00   00   00   57   58   62   84  100
t6    00   00   00   00   00   00   78   79   68   93   98  100
V     66   56   69   68   66   67   28   25   14   30   17   22  100

Note: The model illustrated here contains 6 method factors
(m1 to m6), 6 trait factors (t1 to t6), and a 13th factor
corresponding to the external validity criterion (V). It
differs from the design of model 1 (see Table 5) only in that
the external validity criterion was added as a 13th factor.
E/U stands for the error/uniqueness component.
TABLE 7

MTMM matrix for Summed Master Teacher and Student Teacher Ratings, and
Correlations with Overall Ratings, Total Ratings, and the Validity Criterion

                     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18

Student Teachers
 1 - STMech        100
 2 - STSent         86 100
 3 - STOrg          77  82 100
 4 - STWord         79  80  77 100
 5 - STCont         69  71  80  79 100
 6 - STQual         7?  74  75  83  87 100

Master Teachers
 7 - MTMech         87  82  74  80  68  76 100
 8 - MTSent         86  86  77  81  70  79  95 100
 9 - MTOrg          76  76  79  80  70  75  85  88 100
10 - MTWord         80  78  74  86  75  81  87  89  87 100
11 - MTCont         75  72  73  82  75  82  ??  83  87  91 100
12 - MTQual         81  78  74  85  74  81  96  91  91  94  92 100

Overall/Total Ratings
13 - STOverall      88  86  86  89  89  93  84  86  81  86  86  86 100
14 - STTotal        90  91  ??  92  89  91  86  88  84  88  85  87  98 100
15 - MTOverall      83  80  79  84  75  82  92  94  93  95  93  97  88  89 100
16 - MTTotal        85  83  79  86  76  83  93  96  94  96  94  97  89  91  99 100
17 - MTEarly        81  80  75  84  71  79  86  88  86  91  85  90  85  87  91  92 100

Validity Criterion
18 - Essay Test     74  70  63  73  62  69  75  75  71  80  72  76  74  76  78  79  76 100

NOTE: All coefficients, presented without decimal points, are statistically
significant. MT and ST refer to ratings by master teachers and student
teachers, which are summed across ratings by the three teachers in each group.